S&P 500 Latest Stocks Analysis¶
=====================================
In this notebook, we analyze the most recent five years (2019-2024) of S&P 500 stock data, focusing on:
Analysis Breakdown¶
- 30%: Most Traded Stocks by Average Volume
  - Analyzing the most frequently traded stocks in recent years
- 70%: Top 6 Tech Stocks
  - In-depth analysis of the top 6 tech stocks from the index, including:
    - Technical Analysis
    - Comparative Analysis (overall and most similar stocks)
    - Fundamental Analysis
    - Quantitative Analysis (building a predictive model)
In [2]:
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import yfinance as yf
from datetime import datetime, timedelta
from scipy import stats
import matplotlib.dates as mdates
from matplotlib.colors import LinearSegmentedColormap
import plotly.tools as tls
import plotly.figure_factory as ff
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, plot, iplot
import statsmodels.api as sm
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
In [3]:
df1=pd.read_csv(r"C:\Users\PMLS\Downloads\sAND P TOP 5000 COAMPNIES NAMES\sp500_stocks.csv")
df1.head()
Out[3]:
| Date | Symbol | Adj Close | Close | High | Low | Open | Volume | |
|---|---|---|---|---|---|---|---|---|
| 0 | 2010-01-04 | MMM | 44.016724 | 69.414719 | 69.774246 | 69.122070 | 69.473244 | 3640265.0 |
| 1 | 2010-01-05 | MMM | 43.741020 | 68.979935 | 69.590302 | 68.311035 | 69.230766 | 3405012.0 |
| 2 | 2010-01-06 | MMM | 44.361343 | 69.958191 | 70.735786 | 69.824417 | 70.133781 | 6301126.0 |
| 3 | 2010-01-07 | MMM | 44.393166 | 70.008362 | 70.033447 | 68.662209 | 69.665550 | 5346240.0 |
| 4 | 2010-01-08 | MMM | 44.705982 | 70.501671 | 70.501671 | 69.648827 | 69.974915 | 4073337.0 |
Data Cleaning¶
In [3]:
df1['Date'] = pd.to_datetime(df1['Date'])
now_time = datetime.now()
five_years_ago = now_time - timedelta(days=5*365)
df = df1[df1['Date'] >= five_years_ago]
df = df.reset_index(drop=True)
df
Out[3]:
| Date | Symbol | Adj Close | Close | High | Low | Open | Volume | |
|---|---|---|---|---|---|---|---|---|
| 0 | 2019-09-16 | MMM | 115.520706 | 141.864548 | 143.152176 | 140.392975 | 143.152176 | 2941323.0 |
| 1 | 2019-09-17 | MMM | 114.431313 | 140.526749 | 140.944809 | 139.214050 | 140.267563 | 3248814.0 |
| 2 | 2019-09-18 | MMM | 114.002403 | 140.000000 | 140.510040 | 138.168900 | 140.142136 | 2507653.0 |
| 3 | 2019-09-19 | MMM | 113.750465 | 139.690628 | 141.145493 | 139.556854 | 139.958191 | 1941108.0 |
| 4 | 2019-09-20 | MMM | 113.539391 | 139.431442 | 141.446487 | 139.197327 | 139.991638 | 4423884.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 629248 | 2024-08-28 | ZTS | 182.910004 | 182.910004 | 184.610001 | 182.100006 | 182.100006 | 1485400.0 |
| 629249 | 2024-08-29 | ZTS | 182.889999 | 182.889999 | 184.710007 | 182.479996 | 183.910004 | 1359900.0 |
| 629250 | 2024-08-30 | ZTS | 183.490005 | 183.490005 | 184.220001 | 180.979996 | 183.089996 | 2328500.0 |
| 629251 | 2024-09-03 | ZTS | 183.570007 | 183.570007 | 184.750000 | 182.460007 | 182.710007 | 1680900.0 |
| 629252 | 2024-09-04 | ZTS | 187.330002 | 187.330002 | 187.500000 | 181.585007 | 183.190002 | 1846912.0 |
629253 rows × 8 columns
In [ ]:
# Ensure 'Date' column is in datetime format
df['Date'] = pd.to_datetime(df['Date'])
# Define the start and end dates
start_date = pd.to_datetime('2019-09-09')
end_date = pd.to_datetime('2024-09-04')
# Generate a date range from start_date to end_date
full_date_range = pd.date_range(start=start_date, end=end_date)
# Extract the unique dates from the 'Date' column in df_last_5_years
available_dates = pd.to_datetime(df['Date'].unique())
# Find the missing dates by comparing the full range with available dates
missing_dates = full_date_range.difference(available_dates)
# Get the count of missing dates
missing_count = len(missing_dates)
# Print the count of missing dates
print(f"Count of Missing Dates: {missing_count}")
# Print each missing date in a human-readable format
for missing_date in missing_dates:
    print(f"{missing_date.strftime('%d %B %Y')} is missing")
Continuing with the Analysis¶
In [5]:
df=df.rename(columns={"Symbol":"ticks"})
new_df= df.dropna()
new_df.isnull().sum()
Out[5]:
Date         0
ticks        0
Adj Close    0
Close        0
High         0
Low          0
Open         0
Volume       0
dtype: int64
In [6]:
new_df
Out[6]:
| Date | ticks | Adj Close | Close | High | Low | Open | Volume | |
|---|---|---|---|---|---|---|---|---|
| 0 | 2019-09-16 | MMM | 115.520706 | 141.864548 | 143.152176 | 140.392975 | 143.152176 | 2941323.0 |
| 1 | 2019-09-17 | MMM | 114.431313 | 140.526749 | 140.944809 | 139.214050 | 140.267563 | 3248814.0 |
| 2 | 2019-09-18 | MMM | 114.002403 | 140.000000 | 140.510040 | 138.168900 | 140.142136 | 2507653.0 |
| 3 | 2019-09-19 | MMM | 113.750465 | 139.690628 | 141.145493 | 139.556854 | 139.958191 | 1941108.0 |
| 4 | 2019-09-20 | MMM | 113.539391 | 139.431442 | 141.446487 | 139.197327 | 139.991638 | 4423884.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 629248 | 2024-08-28 | ZTS | 182.910004 | 182.910004 | 184.610001 | 182.100006 | 182.100006 | 1485400.0 |
| 629249 | 2024-08-29 | ZTS | 182.889999 | 182.889999 | 184.710007 | 182.479996 | 183.910004 | 1359900.0 |
| 629250 | 2024-08-30 | ZTS | 183.490005 | 183.490005 | 184.220001 | 180.979996 | 183.089996 | 2328500.0 |
| 629251 | 2024-09-03 | ZTS | 183.570007 | 183.570007 | 184.750000 | 182.460007 | 182.710007 | 1680900.0 |
| 629252 | 2024-09-04 | ZTS | 187.330002 | 187.330002 | 187.500000 | 181.585007 | 183.190002 | 1846912.0 |
621845 rows × 8 columns
In [7]:
new_df.columns = new_df.columns.str.lower()
In [8]:
mean_dict = {}
# Find the average traded volume for each ticker using a loop
for key in new_df['ticks'].unique():
    value = new_df[new_df['ticks'] == key]['volume'].mean()
    mean_dict[key] = value
print("Length of the mean of ticks dictionary:", len(mean_dict))
# Convert the dict to a pandas Series and take the top 10 by average volume
average_s = pd.Series(mean_dict)
top10_s = average_s.sort_values(ascending=False)[:10]
print("Top 10 company tickers with highest average traded stock volume:\n", top10_s.index)
Length of the mean of ticks dictionary: 503
Top 10 company tickers with highest average traded stock volume:
 Index(['NVDA', 'TSLA', 'AAPL', 'AMZN', 'F', 'AMD', 'BAC', 'T', 'AAL', 'CCL'], dtype='object')
- F is Ford Motor Company
- T is AT&T Inc.
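The per-ticker averaging loop above can be collapsed into a single `groupby`; a minimal sketch on toy data (the `toy` frame stands in for `new_df`):

```python
import pandas as pd

# Toy frame standing in for new_df (only the 'ticks' and 'volume' columns matter here)
toy = pd.DataFrame({
    'ticks':  ['AAPL', 'AAPL', 'NVDA', 'NVDA', 'F'],
    'volume': [100.0,  300.0,  500.0,  700.0,  50.0],
})

# Mean volume per ticker, sorted descending -- equivalent to building
# mean_dict in a loop and then calling sort_values on the Series
top = toy.groupby('ticks')['volume'].mean().sort_values(ascending=False)
print(top.head(10))
```

On the toy data, NVDA (mean 600) ranks first, as it does in the real output above.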
In [9]:
def subdataframe(df, tick):
    # top 10 ticks
    ticks = list(top10_s.index)
    assert tick in ticks, """Stock tick does not belong to top 10 stocks by trade volume, please try any of these:
    ['NVDA', 'TSLA', 'AAPL', 'AMZN', 'F', 'AMD', 'BAC', 'T', 'AAL', 'CCL']"""
    # Filter the passed-in frame down to the requested ticker
    ndf = df[df['ticks'] == tick]
    return ndf
In [10]:
sp500Cmp_info_df = pd.read_csv('C:/Users/PMLS/Downloads/sAND P TOP 5000 COAMPNIES NAMES/sp500_companies.csv')
Top10HigestTradeNVD=['NVDA', 'TSLA', 'AAPL', 'AMZN', 'F', 'AMD', 'BAC', 'T', 'AAL', 'CCL']
info_df=pd.DataFrame({'tick':Top10HigestTradeNVD})
info_df
Out[10]:
| tick | |
|---|---|
| 0 | NVDA |
| 1 | TSLA |
| 2 | AAPL |
| 3 | AMZN |
| 4 | F |
| 5 | AMD |
| 6 | BAC |
| 7 | T |
| 8 | AAL |
| 9 | CCL |
In [11]:
Final_info = pd.merge(info_df, sp500Cmp_info_df, how='left', left_on='tick', right_on='Symbol')
Final_info =Final_info[['tick','Shortname','Sector','Industry']]
Final_info
Out[11]:
| tick | Shortname | Sector | Industry | |
|---|---|---|---|---|
| 0 | NVDA | NVIDIA Corporation | Technology | Semiconductors |
| 1 | TSLA | Tesla, Inc. | Consumer Cyclical | Auto Manufacturers |
| 2 | AAPL | Apple Inc. | Technology | Consumer Electronics |
| 3 | AMZN | Amazon.com, Inc. | Consumer Cyclical | Internet Retail |
| 4 | F | Ford Motor Company | Consumer Cyclical | Auto Manufacturers |
| 5 | AMD | Advanced Micro Devices, Inc. | Technology | Semiconductors |
| 6 | BAC | Bank of America Corporation | Financial Services | Banks - Diversified |
| 7 | T | AT&T Inc. | Communication Services | Telecom Services |
| 8 | AAL | American Airlines Group, Inc. | Industrials | Airlines |
| 9 | CCL | Carnival Corporation | Consumer Cyclical | Travel Services |
In [12]:
# Update company dict for your top 10 companies
company_dict = {
'NVDA': 'NVIDIA',
'TSLA': 'Tesla',
'AAPL': 'Apple',
'AMZN': 'Amazon',
'F': 'Ford Motor Company',
'AMD': 'Advanced Micro Devices',
'BAC': 'Bank of America',
'T': 'AT&T',
'AAL': 'American Airlines',
'CCL': 'Carnival Corporation'
}
# Create sub-dataframes for each company
nvda_df = subdataframe(new_df, 'NVDA')
tsla_df = subdataframe(new_df, 'TSLA')
aapl_df = subdataframe(new_df, 'AAPL')
amzn_df = subdataframe(new_df, 'AMZN')
f_df = subdataframe(new_df, 'F')
amd_df = subdataframe(new_df, 'AMD')
bac_df = subdataframe(new_df, 'BAC')
t_df = subdataframe(new_df, 'T')
aal_df = subdataframe(new_df, 'AAL')
ccl_df = subdataframe(new_df, 'CCL')
# Define a function that adds daily return, daily mean, and company name columns
def dailyfunc(df):
    df = df.copy()  # work on a copy to avoid mutating a view of new_df
    # Intraday return: percentage change from open to close
    df['daily return'] = ((df['close'] - df['open']) / df['open']) * 100
    # Typical price: average of open, close, high, and low
    df['daily_mean'] = (df['open'] + df['close'] + df['high'] + df['low']) / 4
    df['co_name'] = company_dict[df['ticks'].unique()[0]]
    return df
# Apply dailyfunc to each company's dataframe
nvda_df = dailyfunc(nvda_df)
tsla_df = dailyfunc(tsla_df)
aapl_df = dailyfunc(aapl_df)
amzn_df = dailyfunc(amzn_df)
f_df = dailyfunc(f_df)
amd_df = dailyfunc(amd_df)
bac_df = dailyfunc(bac_df)
t_df = dailyfunc(t_df)
aal_df = dailyfunc(aal_df)
ccl_df = dailyfunc(ccl_df)
# Print the start and end date for each company as a sanity check
print('\t\tStart Date\t\t\t\t\tEnd Date')
print(f"NVDA\t\t{nvda_df['date'].min()}\t\t\t{nvda_df['date'].max()}")
print(f"TSLA\t\t{tsla_df['date'].min()}\t\t\t{tsla_df['date'].max()}")
print(f"AAPL\t\t{aapl_df['date'].min()}\t\t\t{aapl_df['date'].max()}")
print(f"AMZN\t\t{amzn_df['date'].min()}\t\t\t{amzn_df['date'].max()}")
print(f"F\t\t{f_df['date'].min()}\t\t\t{f_df['date'].max()}")
print(f"AMD\t\t{amd_df['date'].min()}\t\t\t{amd_df['date'].max()}")
print(f"BAC\t\t{bac_df['date'].min()}\t\t\t{bac_df['date'].max()}")
print(f"T\t\t{t_df['date'].min()}\t\t\t{t_df['date'].max()}")
print(f"AAL\t\t{aal_df['date'].min()}\t\t\t{aal_df['date'].max()}")
print(f"CCL\t\t{ccl_df['date'].min()}\t\t\t{ccl_df['date'].max()}")
		Start Date			End Date
NVDA		2019-09-16 00:00:00		2024-09-04 00:00:00
TSLA		2019-09-16 00:00:00		2024-09-04 00:00:00
AAPL		2019-09-16 00:00:00		2024-09-04 00:00:00
AMZN		2019-09-16 00:00:00		2024-09-04 00:00:00
F		2019-09-16 00:00:00		2024-09-04 00:00:00
AMD		2019-09-16 00:00:00		2024-09-04 00:00:00
BAC		2019-09-16 00:00:00		2024-09-04 00:00:00
T		2019-09-16 00:00:00		2024-09-04 00:00:00
AAL		2019-09-16 00:00:00		2024-09-04 00:00:00
CCL		2019-09-16 00:00:00		2024-09-04 00:00:00
- The start and end dates agree across all ten stocks, so the filtered data covers the same five-year window for every ticker
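This eyeball check can be automated instead of reading the printed dates; a minimal sketch, using toy frames in place of the per-ticker DataFrames:

```python
import pandas as pd

# Two toy per-ticker frames standing in for nvda_df, tsla_df, ...
a = pd.DataFrame({'date': pd.to_datetime(['2019-09-16', '2024-09-04'])})
b = pd.DataFrame({'date': pd.to_datetime(['2019-09-16', '2024-09-04'])})

# Every frame should share exactly one (start, end) pair
spans = {(d['date'].min(), d['date'].max()) for d in [a, b]}
assert len(spans) == 1, f"Date ranges differ across tickers: {spans}"
print("All tickers share the same date range")
```

In the notebook, the same set comprehension over the real list of ten DataFrames would replace the ten print statements.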
In [13]:
def plot_closing_stock_prices(dfs, ncols=2):
    # Calculate the number of rows needed
    nrows = (len(dfs) + ncols - 1) // ncols
    # Create the figure and axes for the subplots
    fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(15, 5 * nrows), facecolor='#f4f4f4')
    fig.subplots_adjust(hspace=0.4, wspace=0.3)
    for ax, df in zip(axes.flat, dfs):
        # Find the highest closing price and the date it occurred
        # (renamed from 'datetime' to avoid shadowing the imported datetime)
        high = df['close'].max()
        high_date = df[df['close'] == high]['date'].values[0]
        # Color declining stocks red and rising stocks green
        if df['ticks'].unique()[0] in ['GE', 'F']:
            facecolor = '#ed615c'
        else:
            facecolor = '#4bd659'
        # Plot the data
        ax.plot(df['date'], df['close'], color='#0f2113')
        ax.set_title(f"{df['co_name'].unique()[0]} Stock Price", fontsize=16, fontweight='bold')
        ax.set_xlabel("Date", fontsize=12)
        ax.set_ylabel("Daily Closing Stock Price", fontsize=12)
        ax.set_facecolor('#ffffff')  # Set background color for each subplot
        # Annotate the all-time-high closing price
        ax.annotate(
            f"All time high price during\nfive year period\nwas ${high:.2f}",
            xy=(high_date, high),
            xytext=(high_date, high - 0.1 * high),
            bbox=dict(boxstyle="round", facecolor='#f5d3bf', edgecolor='#d0d5db'),
            arrowprops=dict(facecolor='#f0190a', headlength=25, shrink=0.1)
        )
    # Hide any unused subplots
    for ax in axes.flat[len(dfs):]:
        ax.set_visible(False)
    plt.show()
# Call the function with the ten per-ticker dataframes
plot_closing_stock_prices([nvda_df, tsla_df, aapl_df, amzn_df, f_df, amd_df, bac_df, t_df, aal_df, ccl_df])
# Call the function with your dataframes
plot_closing_stock_prices([nvda_df, tsla_df, aapl_df, amzn_df, f_df, amd_df, bac_df, t_df, aal_df, ccl_df])
- Declining and rising companies are clearly distinguishable
- Softer background colors improve readability
In [14]:
def trade_vol(df):
    # Set the style and color palette using Seaborn
    sns.set(style="whitegrid")
    palette = sns.color_palette("pastel")  # Light, pastel color palette
    # x and y coords for average trade volume
    ave_x = df['date'].mean()
    ave_y = df['volume'].mean()
    # y coord for max trade volume
    max_y = df['volume'].max()
    # y coord for min trade volume
    min_y = df['volume'].min()
    # Create the figure and axis
    fig, ax = plt.subplots(figsize=(12, 7))
    # Plot the trade volume with a smooth line
    ax.plot(df['date'], df['volume'], color=palette[0], linewidth=2.5)
    # Set the title and labels
    ax.set_title(f"{df['co_name'].unique()[0]} Stock Trade Volume", fontsize=22, fontweight='bold')
    ax.set_xlabel("Date", fontsize=16)
    ax.set_ylabel("Daily Trade Volume", fontsize=16)
    # Add horizontal lines for max, min, and average trade volumes
    ax.axhline(y=max_y, linestyle='--', lw=2.5, color=palette[1])
    ax.axhline(y=min_y, linestyle='--', lw=2.5, color=palette[2])
    ax.axhline(y=ave_y, linestyle='--', lw=2.5, color=palette[3])
    # Add a vertical line at the date of maximum trade volume
    ax.axvline(x=df[df['volume'] == max_y]['date'].values[0], linestyle='--', lw=2.5, color=palette[4])
    # Annotate the plot with average, max, and min trade volumes
    ax.annotate(f"Average Trade Volume {ave_y:,.2f}",
                xy=(ave_x, ave_y),
                xytext=(ave_x, ave_y + 0.1 * ave_y),
                bbox=dict(boxstyle="round,pad=0.3", facecolor=palette[5], edgecolor=palette[6]),
                arrowprops=dict(arrowstyle="->", color=palette[6]))
    ax.annotate(f"Maximum Trade Volume {max_y:,.2f}",
                xy=(ave_x, max_y),
                xytext=(ave_x, max_y - 0.1 * max_y),
                bbox=dict(boxstyle="round,pad=0.3", facecolor=palette[5], edgecolor=palette[6]),
                arrowprops=dict(arrowstyle="->", color=palette[6]))
    ax.annotate(f"Minimum Trade Volume {min_y:,.2f}",
                xy=(ave_x, min_y),
                xytext=(ave_x, min_y - 0.1 * min_y),
                bbox=dict(boxstyle="round,pad=0.3", facecolor=palette[5], edgecolor=palette[6]),
                arrowprops=dict(arrowstyle="->", color=palette[6]))
    # Improve the layout
    plt.tight_layout()
    plt.show()
# Example usage
trade_vol(nvda_df)
trade_vol(tsla_df)
trade_vol(aapl_df)
trade_vol(amzn_df)
trade_vol(f_df)
trade_vol(amd_df)
trade_vol(bac_df)
trade_vol(t_df)
trade_vol(aal_df)
trade_vol(ccl_df)
- Visible horizontal lines mark the maximum, minimum, and average trade volumes
In [15]:
list_df = [nvda_df, tsla_df, aapl_df, amzn_df, f_df, amd_df, bac_df, t_df, aal_df, ccl_df]
# Loop through list_df to find the first and last closing price of each stock
mini = [df[df['date'] == df['date'].min()]['close'].values.item() for df in list_df]
maxi = [df[df['date'] == df['date'].max()]['close'].values.item() for df in list_df]
# Absolute difference between the last and first closing prices
diff = np.array(maxi) - np.array(mini)
# Percentage growth over the period
growth = (diff / mini) * 100
growth_list = growth.tolist()
co_name_list = [df['co_name'].unique()[0] for df in list_df]
# visualize the growth of the stocks
fig, ax = plt.subplots(figsize=(13,7))
ax.barh(y=co_name_list, width=growth_list, height=0.9, color=['#4bd659','#4bd659','#4bd659','#4bd659','#4bd659',
'#4bd659','#4bd659','#ed615c','#ed615c','#ed615c'],
edgecolor='#713ae8')
for p in ax.patches:
    ax.annotate(f'{round(p.get_width(), 2)}%', (p.get_width() + 15, p.get_y() + 0.3))
ax.set_xlabel('Percentage growth in stock price')
ax.set_ylabel('Name of companies')
ax.set_title("Growth in stock price over a period of 5 years")
plt.show()
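The bar chart above reports total percentage growth over the whole window; annualizing it as a compound annual growth rate (CAGR) makes figures comparable across different horizons. A minimal sketch (the `cagr` helper is illustrative, not from the notebook):

```python
# Compound annual growth rate, in percent, from a start price,
# an end price, and the span of the holding period in years
def cagr(start_price, end_price, years):
    return ((end_price / start_price) ** (1 / years) - 1) * 100

# Example: a stock doubling from 100 to 200 over 5 years
print(f"{cagr(100.0, 200.0, 5):.2f}%")  # -> 14.87%
```

Doubling over five years is about 14.87% per year, noticeably less dramatic than the "100% growth" the simple difference would report.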
Comparative Analysis of Top Tech Stocks in the Index¶
In [16]:
# function to return top 10 sub dataframe
def subdataframe(df, tick):
    # top 10 ticks
    ticks = list(top10_s.index)
    assert tick in ticks, """Stock tick does not belong to top 10 stocks by trade volume, please try any of these:
    ['NVDA', 'TSLA', 'AAPL', 'AMZN', 'F', 'AMD', 'BAC', 'T', 'AAL', 'CCL']"""
    ndf = new_df[new_df['ticks'] == tick]
    return ndf
In [17]:
ab = sp500Cmp_info_df[sp500Cmp_info_df['Industry'] == 'Software - Infrastructure']
ab=ab.sort_values(by='Marketcap',ascending=False)
ab=ab.head(6)
ab
Out[17]:
| Exchange | Symbol | Shortname | Longname | Sector | Industry | Currentprice | Marketcap | Ebitda | Revenuegrowth | City | State | Country | Fulltimeemployees | Longbusinesssummary | Weight | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | NMS | MSFT | Microsoft Corporation | Microsoft Corporation | Technology | Software - Infrastructure | 408.90 | 3039369887744 | 1.294330e+11 | 0.152 | Redmond | WA | United States | 228000.0 | Microsoft Corporation develops and supports so... | 0.059346 |
| 20 | NYQ | ORCL | Oracle Corporation | Oracle Corporation | Technology | Software - Infrastructure | 140.75 | 387887300608 | 2.122700e+10 | 0.033 | Austin | TX | United States | 159000.0 | Oracle Corporation offers products and service... | 0.007574 |
| 28 | NMS | ADBE | Adobe Inc. | Adobe Inc. | Technology | Software - Infrastructure | 575.25 | 255065849856 | 7.844000e+09 | 0.102 | San Jose | CA | United States | 29945.0 | Adobe Inc., together with its subsidiaries, op... | 0.004980 |
| 87 | NMS | PANW | Palo Alto Networks, Inc. | Palo Alto Networks, Inc. | Technology | Software - Infrastructure | 346.15 | 112533364736 | 9.601666e+08 | 0.121 | Santa Clara | CA | United States | NaN | Palo Alto Networks, Inc. provides cybersecurit... | 0.002197 |
| 130 | NMS | SNPS | Synopsys, Inc. | Synopsys, Inc. | Technology | Software - Infrastructure | 477.53 | 73355296768 | 1.652390e+09 | 0.127 | Sunnyvale | CA | United States | 20300.0 | Synopsys, Inc. provides electronic design auto... | 0.001432 |
| 149 | NMS | CRWD | CrowdStrike Holdings, Inc. | CrowdStrike Holdings, Inc. | Technology | Software - Infrastructure | 259.32 | 63566077952 | 1.874900e+08 | 0.317 | Austin | TX | United States | 9219.0 | CrowdStrike Holdings, Inc. provides cybersecur... | 0.001241 |
In [18]:
company_dict = {
'MSFT': 'Microsoft',
'ORCL': 'Oracle Corporation',
'ADBE': 'Adobe Inc.',
'PANW': 'Palo Alto Networks',
'SNPS': 'Synopsys Inc.',
'CRWD': 'CrowdStrike Holdings'
}
In [19]:
def subdataframe(df, tick, *ticks):
    ticks_list = list(ticks)
    assert tick in ticks_list, f"Stock tick does not belong to the provided list of ticks: {ticks_list}"
    ndf = df[df['ticks'] == tick]
    return ndf
TechStocks = ['MSFT', 'ORCL', 'ADBE', 'PANW', 'SNPS', 'CRWD']
# Create individual DataFrames for each stock
MSFT_df = subdataframe(new_df, 'MSFT', *TechStocks)
ORCL_df = subdataframe(new_df, 'ORCL', *TechStocks)
ADBE_df = subdataframe(new_df, 'ADBE', *TechStocks)
PANW_df = subdataframe(new_df, 'PANW', *TechStocks)
SNPS_df = subdataframe(new_df, 'SNPS', *TechStocks)
CRWD_df = subdataframe(new_df, 'CRWD', *TechStocks)
# Apply the dailyfunc function to each DataFrame
MSFT_df = dailyfunc(MSFT_df)
ORCL_df = dailyfunc(ORCL_df)
ADBE_df = dailyfunc(ADBE_df)
PANW_df = dailyfunc(PANW_df)
SNPS_df = dailyfunc(SNPS_df)
CRWD_df = dailyfunc(CRWD_df)
In [21]:
background_gradient = LinearSegmentedColormap.from_list("", ["#212121", "#1A1D23", "#03055B"])
line_colors = ['#FFFF00', '#00BFFF', '#32CD32', '#FF00FF', '#FFA500', '#d62728']
# Create the figure and axis with dark background
fig, ax = plt.subplots(figsize=(14, 7))
fig.patch.set_facecolor('#1A1D23') # Set figure background color to match theme
ax.set_facecolor('#212121') # Set axis background color
# Plot each stock's daily mean price with distinct colors and line widths for visibility on dark theme
ax.plot(MSFT_df['date'], MSFT_df['daily_mean'], label='Microsoft (MSFT)', color=line_colors[0], lw=1.4)
ax.plot(ORCL_df['date'], ORCL_df['daily_mean'], label='Oracle (ORCL)', color=line_colors[1], lw=1.4)
ax.plot(ADBE_df['date'], ADBE_df['daily_mean'], label='Adobe (ADBE)', color=line_colors[2], lw=1.4)
ax.plot(PANW_df['date'], PANW_df['daily_mean'], label='Palo Alto Networks (PANW)', color=line_colors[3], lw=1.4)
ax.plot(SNPS_df['date'], SNPS_df['daily_mean'], label='Synopsys (SNPS)', color=line_colors[4], lw=1.4)
ax.plot(CRWD_df['date'], CRWD_df['daily_mean'], label='CrowdStrike (CRWD)', color=line_colors[5], lw=1.4)
# Format the x-axis to display years only
ax.xaxis.set_major_locator(mdates.YearLocator()) # Place tick at the start of each year
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y')) # Format to show only the year
plt.xticks(rotation=0, color='white') # Make tick labels white for dark background
# Add title and labels with improved formatting
ax.set_title("Comparative Analysis of Tech Stock Prices Based on Mean Price", fontsize=16, fontweight='bold', color='white')
ax.set_ylabel("Daily Average Stock Price", fontsize=14, color='white')
# Add legend with a clear background
ax.legend(facecolor='#1A1D23', fontsize="medium", title="Tech Stocks", title_fontsize=13, labelcolor='white', edgecolor='white')
# Add gridlines for better readability, change to a softer color to match the dark theme
ax.grid(True, color='#444444')
# Show the plot
plt.tight_layout()
plt.show()
Daily Mean = (Open + Close + High + Low) / 4
Volatility Analysis¶
In [22]:
dataframes = [MSFT_df, ORCL_df, ADBE_df, PANW_df, SNPS_df, CRWD_df]
labels = ['MSFT', 'ORCL', 'ADBE', 'PANW', 'SNPS', 'CRWD']
# Set up the plot
plt.figure(figsize=(14, 8))
# Loop through each DataFrame to plot only KDE (no histograms)
for df, label in zip(dataframes, labels):
    # Plot only the KDE of each DataFrame's 'daily return' for easy comparison
    sns.kdeplot(df['daily return'], label=label, linewidth=2)
# Add labels and title
plt.title('KDE of Daily Returns for Top 6 Stocks')
plt.xlabel('Daily Return')
plt.ylabel('Density')
plt.legend(title='Stock')
# Show the plot
plt.show()
In [23]:
dataframes = [MSFT_df, ORCL_df, ADBE_df, PANW_df, SNPS_df, CRWD_df]
labels = ['MSFT', 'ORCL', 'ADBE', 'PANW', 'SNPS', 'CRWD']
# Create a combined DataFrame for plotting
combined_df = pd.DataFrame()
# Loop through dataframes and calculate standard deviation of 'daily return'
std_devs = {}
for df, label in zip(dataframes, labels):
    # Create a temporary DataFrame with a 'Stock' column
    temp_df = df[['daily return']].copy()
    temp_df['Stock'] = label
    # Append to the combined DataFrame
    combined_df = pd.concat([combined_df, temp_df])
    # Calculate the standard deviation for this stock and store it
    std_devs[label] = temp_df['daily return'].std()
# Sort stocks by their standard deviation (volatility)
sorted_labels = sorted(std_devs, key=std_devs.get)
# Set up the plot
plt.figure(figsize=(12, 6))
# Create the box plot, sorted by volatility
sns.boxplot(x='Stock', y='daily return', data=combined_df, order=sorted_labels, palette="Set2")
# Add labels and title
plt.title('Box Plot of Daily Returns for Top 6 Stocks (Sorted by Volatility)')
plt.ylabel('Daily Return')
# Show the plot
plt.show()
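The standard deviations behind this box plot are daily figures; multiplying by the square root of roughly 252 trading days per year gives the annualized volatility that is usually quoted. A small sketch on toy returns:

```python
import numpy as np

daily_returns = np.array([0.5, -1.2, 0.8, -0.3, 1.1])  # toy daily returns in %
daily_vol = daily_returns.std(ddof=1)                   # sample standard deviation
annual_vol = daily_vol * np.sqrt(252)                   # scale to ~252 trading days
print(f"annualized volatility: {annual_vol:.1f}%")
```

The same scaling applied to each entry of `std_devs` would rank the six stocks identically, since the factor is a constant.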
Risk Analysis¶
How much value do we put at risk by investing in a particular stock?
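A common way to quantify this question is historical Value at Risk (VaR): the empirical return quantile below which only the worst 5% of days fall. A sketch on synthetic returns (the notebook itself does not compute VaR; the parameters here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(0.05, 2.0, 1000)  # toy daily returns in %: mean 0.05, std 2

# 95% one-day historical VaR: the 5th percentile of observed daily returns
var_95 = np.percentile(returns, 5)
print(f"With 95% confidence, a one-day loss should not exceed {abs(var_95):.2f}%")
```

Running the same percentile over each stock's `daily return` column would give a per-stock risk figure to sit alongside the mean/standard-deviation scatter below.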
In [24]:
means = []
std_devs = []
# Loop through dataframes and calculate mean and standard deviation for each stock
for df in dataframes:
    mean_return = df['daily return'].mean()
    std_dev = df['daily return'].std()
    means.append(mean_return)
    std_devs.append(std_dev)
# Define size and color for scatter plot
area = np.pi * 20 # Size of the scatter plot marker
colors = ['blue', 'green', 'red', 'purple', 'orange', 'brown'] # Colors for each stock
# Create a scatter plot
plt.figure(figsize=(12, 8))
# Plot each stock with a different color and annotate
for mean, std, label, color in zip(means, std_devs, labels, colors):
    plt.scatter(mean, std, s=area, c=color, label=label)
    plt.annotate(label, xy=(mean, std), xytext=(50, 50), textcoords='offset points',
                 ha='right', va='bottom',
                 arrowprops=dict(arrowstyle='-', color=color, connectionstyle='arc3,rad=-0.3'))
# Add labels, title, and legend
plt.xlabel('Expected Return (Mean)')
plt.ylabel('Risk (Standard Deviation)')
plt.title('Scatter Plot of Risk vs Expected Return for Top 6 Stocks')
# Show the plot
plt.show()
In [25]:
SNPS_df
Out[25]:
| date | ticks | adj close | close | high | low | open | volume | daily return | daily_mean | co_name | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 540432 | 2019-09-16 | SNPS | 133.240005 | 133.240005 | 134.149994 | 132.529999 | 133.619995 | 1150600.0 | -0.284381 | 133.384998 | Synopsys Inc. |
| 540433 | 2019-09-17 | SNPS | 136.699997 | 136.699997 | 136.880005 | 132.509995 | 133.050003 | 1028300.0 | 2.743325 | 134.785000 | Synopsys Inc. |
| 540434 | 2019-09-18 | SNPS | 136.500000 | 136.500000 | 137.070007 | 133.639999 | 136.660004 | 978700.0 | -0.117082 | 135.967503 | Synopsys Inc. |
| 540435 | 2019-09-19 | SNPS | 137.130005 | 137.130005 | 138.729996 | 136.460007 | 136.669998 | 777300.0 | 0.336582 | 137.247501 | Synopsys Inc. |
| 540436 | 2019-09-20 | SNPS | 135.720001 | 135.720001 | 137.990005 | 135.630005 | 137.110001 | 1196700.0 | -1.013784 | 136.612503 | Synopsys Inc. |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 541678 | 2024-08-28 | SNPS | 509.980011 | 509.980011 | 523.799988 | 506.329987 | 522.909973 | 1082800.0 | -2.472694 | 515.754990 | Synopsys Inc. |
| 541679 | 2024-08-29 | SNPS | 517.020020 | 517.020020 | 526.429993 | 513.429993 | 514.919983 | 932400.0 | 0.407837 | 517.949997 | Synopsys Inc. |
| 541680 | 2024-08-30 | SNPS | 519.580017 | 519.580017 | 529.000000 | 514.440002 | 525.530029 | 1233500.0 | -1.132193 | 522.137512 | Synopsys Inc. |
| 541681 | 2024-09-03 | SNPS | 481.220001 | 481.220001 | 514.400024 | 479.089996 | 514.150024 | 2561000.0 | -6.404750 | 497.215012 | Synopsys Inc. |
| 541682 | 2024-09-04 | SNPS | 477.529999 | 477.529999 | 483.399994 | 472.600006 | 475.658508 | 1892187.0 | 0.393453 | 477.297127 | Synopsys Inc. |
1251 rows × 11 columns
Moving Average¶
A simple way to smooth out noise in a price series is the moving average.
- Its key signals are the Golden Cross (the short-term average crossing above the long-term average) and the Death Cross (crossing below)
We will compute both averages for two comparable stocks and compare their signals against each other.
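The crossover detection used in the cells below can be seen on a minimal synthetic series: a comparison of the two averages gives a 0/1 signal, and `diff()` of that signal flags a Golden Cross (+1) or a Death Cross (-1). A sketch:

```python
import pandas as pd

# Synthetic price series: falls, then rallies, so the short MA crosses above the long MA
prices = pd.Series([10, 9, 8, 7, 8, 9, 11, 13, 15, 17], dtype=float)

short_ma = prices.rolling(window=2).mean()
long_ma = prices.rolling(window=4).mean()

signal = (short_ma > long_ma).astype(float)  # 1.0 while the short MA leads
cross = signal.diff()                        # +1 = Golden Cross, -1 = Death Cross

golden_days = cross[cross == 1].index.tolist()
print("Golden Cross at positions:", golden_days)  # -> [5]
```

The notebook's In [28] cell applies exactly this pattern with 50- and 200-day windows on the adjusted close.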
In [26]:
PANW_df
Out[26]:
| date | ticks | adj close | close | high | low | open | volume | daily return | daily_mean | co_name | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 454113 | 2019-09-16 | PANW | 71.933334 | 71.933334 | 72.436668 | 69.243332 | 69.339996 | 4389000.0 | 3.740032 | 70.738333 | Palo Alto Networks |
| 454114 | 2019-09-17 | PANW | 70.896667 | 70.896667 | 71.823334 | 69.796669 | 71.496666 | 3570000.0 | -0.839198 | 71.003334 | Palo Alto Networks |
| 454115 | 2019-09-18 | PANW | 70.063332 | 70.063332 | 71.000000 | 69.139999 | 70.883331 | 4560000.0 | -1.156830 | 70.271666 | Palo Alto Networks |
| 454116 | 2019-09-19 | PANW | 69.629997 | 69.629997 | 70.266670 | 69.063332 | 69.720001 | 4064400.0 | -0.129093 | 69.670000 | Palo Alto Networks |
| 454117 | 2019-09-20 | PANW | 69.416664 | 69.416664 | 70.860001 | 69.059998 | 69.666664 | 2958300.0 | -0.358852 | 69.750832 | Palo Alto Networks |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 455359 | 2024-08-28 | PANW | 350.440002 | 350.440002 | 352.779999 | 346.079987 | 349.470001 | 1653300.0 | 0.277564 | 349.692497 | Palo Alto Networks |
| 455360 | 2024-08-29 | PANW | 358.209991 | 358.209991 | 366.049988 | 352.220001 | 352.920013 | 2728500.0 | 1.498917 | 357.349998 | Palo Alto Networks |
| 455361 | 2024-08-30 | PANW | 362.720001 | 362.720001 | 362.970001 | 354.570007 | 358.420013 | 2988600.0 | 1.199706 | 359.670006 | Palo Alto Networks |
| 455362 | 2024-09-03 | PANW | 355.070007 | 355.070007 | 365.500000 | 352.950012 | 359.549988 | 2613800.0 | -1.245997 | 358.267502 | Palo Alto Networks |
| 455363 | 2024-09-04 | PANW | 346.149994 | 346.149994 | 348.040009 | 338.089996 | 343.299988 | 3013721.0 | 0.830179 | 343.894997 | Palo Alto Networks |
1251 rows × 11 columns
In [27]:
dataframes = [PANW_df, CRWD_df]
labels = ['PANW', 'CRWD']
PANW_df['date'] = pd.to_datetime(PANW_df['date'])
CRWD_df['date'] = pd.to_datetime(CRWD_df['date'])
PANW_df.set_index('date', inplace=True)
CRWD_df.set_index('date', inplace=True)
In [28]:
short_window = 50
long_window = 200
# Loop through companies and calculate moving averages
for company, label in zip([PANW_df, CRWD_df], ['PANW', 'CRWD']):
    company['Short MA'] = company['adj close'].rolling(window=short_window).mean()
    company['Long MA'] = company['adj close'].rolling(window=long_window).mean()
    # Detect Golden Cross and Death Cross
    company['Signal'] = np.where(company['Short MA'] > company['Long MA'], 1.0, 0.0)
    company['Cross'] = company['Signal'].diff()
# Function to plot a stock with Golden Cross and Death Cross markers
def plot_stock_with_cross(company, name, ax):
    ax.plot(company['adj close'], label='Adjusted Close', alpha=0.6)
    ax.plot(company['Short MA'], label=f'{short_window}-day MA', alpha=0.7)
    ax.plot(company['Long MA'], label=f'{long_window}-day MA', alpha=0.7)
    # Mark Golden Cross (1) and Death Cross (-1)
    ax.plot(company[company['Cross'] == 1].index, company['Short MA'][company['Cross'] == 1], '^', markersize=10, color='g', lw=0, label='Golden Cross')
    ax.plot(company[company['Cross'] == -1].index, company['Short MA'][company['Cross'] == -1], 'v', markersize=10, color='r', lw=0, label='Death Cross')
    ax.set_title(name)
    ax.legend()
    # Show one major tick per year, formatted as the year only
    ax.xaxis.set_major_locator(mdates.YearLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
    # Ensure the x-axis spans 2019 through 2024
    ax.set_xlim([pd.Timestamp('2019-01-01'), pd.Timestamp('2024-12-31')])
# Set up plot with subplots for PANW and CRWD
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(15, 6))
# Plot for PANW and CRWD
plot_stock_with_cross(PANW_df, 'PANW', axes[0])
plot_stock_with_cross(CRWD_df, 'CRWD', axes[1])
plt.tight_layout()
plt.show()
- CrowdStrike has had two Death Crosses, one of them in 2024; we have a good idea of why and when it happened
- We will examine the most recent event that could have triggered it
In [29]:
check = CRWD_df[CRWD_df['Cross'] == -1]
check
Out[29]:
| ticks | adj close | close | high | low | open | volume | daily return | daily_mean | co_name | Short MA | Long MA | Signal | Cross | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| date | ||||||||||||||
| 2022-01-04 | CRWD | 189.190002 | 189.190002 | 197.445007 | 183.320007 | 195.889999 | 6503400.0 | -3.420285 | 191.461254 | CrowdStrike Holdings | 235.3460 | 236.554925 | 0.0 | -1.0 |
| 2024-09-03 | CRWD | 266.600006 | 266.600006 | 277.510010 | 265.290009 | 275.779999 | 5485800.0 | -3.328738 | 271.295006 | CrowdStrike Holdings | 299.4642 | 300.198974 | 0.0 | -1.0 |
- It looks like our data is working: CrowdStrike did indeed have a Death Cross
- Full Article

Market and Equity Analysis of Stocks¶
In [32]:
symbols = ['GOOGL', 'MSFT', 'WMT', 'AMZN', 'BABA', 'ORCL', 'SAP', 'CRM']
holders_data = []
for symbol in symbols:
    # Fetch the stock data
    stock = yf.Ticker(symbol)
    # Get historical market data
    hist = stock.history(start=start_date, end=end_date)
    # Get financial data
    info = stock.info
    financials = stock.financials
    balance_sheet = stock.balance_sheet
    cash_flow = stock.cashflow
    # Try to fetch major holders, handle if not available
    try:
        major_holders = stock.major_holders
    except Exception:
        major_holders = None
    # Try to fetch institutional holders, handle if not available
    try:
        institutional_holders = stock.institutional_holders
    except Exception:
        institutional_holders = None
    holders_data.append({
        'Stock Symbol': symbol,                                 # Stock symbol
        'Major Holders': major_holders,                         # Major holders data
        'Top 10 Institutional Holders': institutional_holders,  # Top 10 institutional holders data
    })
    # Rebuilt on every iteration, growing as holders_data grows
    Holders_data_df = pd.DataFrame(holders_data)
    Holders_data_df.set_index('Stock Symbol', inplace=True)
    # Price and fundamental data for the current symbol; note this frame is
    # overwritten each pass, so after the loop it holds only the last symbol
    df = pd.DataFrame({
        'Date': hist.index,
        'Open': hist['Open'],
        'High': hist['High'],
        'Low': hist['Low'],
        'Close': hist['Close'],
        'Volume': hist['Volume'],
        'EBITDA': info.get('ebitda', None),
        'EPS': info.get('trailingEps', None),
        'PE_Ratio': info.get('trailingPE', None),
        'PB_Ratio': info.get('priceToBook', None),
        'Dividend_Yield': info.get('dividendYield', None),
        'Market_Cap': info.get('marketCap', None),
        'Revenue': financials.loc['Total Revenue'].values[0] if not financials.empty else None,
        'Net_Income': financials.loc['Net Income'].values[0] if not financials.empty else None,
        'Total_Assets': balance_sheet.loc['Total Assets'].values[0] if not balance_sheet.empty else None,
        'Total_Liabilities': balance_sheet.loc['Total Liabilities Net Minority Interest'].values[0] if not balance_sheet.empty else None,
        'Operating_Cash_Flow': cash_flow.loc['Operating Cash Flow'].values[0] if not cash_flow.empty else None,
    })
    print("Done")
Done Done Done Done Done Done Done Done
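The `financials.loc['Total Revenue']`-style lookups above raise `KeyError` whenever yfinance returns a statement without that row label (labels vary by ticker and change over time). A hedged sketch of a safer accessor, demonstrated on a dummy frame rather than a live yfinance call (`safe_row` is a helper name introduced here, not a yfinance API):

```python
import pandas as pd

def safe_row(statement, label):
    """Return the most recent value for `label`, or None when the
    statement is empty or the row label is missing."""
    if statement is None or statement.empty or label not in statement.index:
        return None
    return statement.loc[label].values[0]

# Dummy stand-in for yfinance's `stock.financials` (columns = period end dates)
financials = pd.DataFrame(
    {'2023-12-31': [1_000_000, 250_000]},
    index=['Total Revenue', 'Net Income'],
)
revenue = safe_row(financials, 'Total Revenue')      # most recent revenue
missing = safe_row(financials, 'Operating Expense')  # absent label -> None
```

Swapping this in for the bare `.loc` calls keeps the snapshot cell from crashing on tickers with sparse statements.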
Equity Distribution By Top 10 Institutional Stakeholders¶
In [33]:
def create_pie_charts(df):
    fig, axes = plt.subplots(1, 2, figsize=(12, 6))  # Create a 1x2 grid for the pie charts
    # Iterate through each stock symbol
    for idx, symbol in enumerate(df.index):
        # Extract top 10 institutional holders' data for the current stock symbol
        institutional_holders = df.loc[symbol, 'Top 10 Institutional Holders']
        # If the institutional holders' data is available
        if institutional_holders is not None:
            # Get the names of the top 10 holders and their corresponding positions (values)
            names = institutional_holders['Holder'].values[:10]
            values = institutional_holders['Shares'].values[:10]
            # NOTE: the 1x2 grid has only two axes, so the third symbol
            # triggers the IndexError shown below
            axes[idx].pie(values, labels=names, autopct='%1.1f%%', startangle=90)
            axes[idx].set_title(f'Top 10 Institutional Holders for {symbol}')
    plt.tight_layout()
    plt.show()
create_pie_charts(Holders_data_df)
--------------------------------------------------------------------------- IndexError Traceback (most recent call last) Cell In[33], line 21 18 plt.tight_layout() 19 plt.show() ---> 21 create_pie_charts(Holders_data_df) Cell In[33], line 15, in create_pie_charts(df) 12 names = institutional_holders['Holder'].values[:10] 13 values = institutional_holders['Shares'].values[:10] ---> 15 axes[idx].pie(values, labels=names, autopct='%1.1f%%', startangle=90) 16 axes[idx].set_title(f'Top 10 Institutional Holders for {symbol}') 18 plt.tight_layout() IndexError: index 2 is out of bounds for axis 0 with size 2
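The crash above comes from hard-coding a 1×2 grid while iterating over eight symbols. A sketch of the fix, sizing the grid from the number of rows and hiding unused axes (dummy holder data stands in for the live yfinance results):

```python
import math
import matplotlib
matplotlib.use('Agg')  # render off-screen for this sketch
import matplotlib.pyplot as plt
import pandas as pd

def create_pie_charts(df, ncols=2):
    # One axis per symbol, arranged ncols wide
    nrows = math.ceil(len(df.index) / ncols)
    fig, axes = plt.subplots(nrows, ncols, figsize=(6 * ncols, 5 * nrows))
    axes = axes.ravel() if nrows * ncols > 1 else [axes]
    for idx, symbol in enumerate(df.index):
        holders = df.loc[symbol, 'Top 10 Institutional Holders']
        if holders is not None:
            axes[idx].pie(holders['Shares'].values[:10],
                          labels=holders['Holder'].values[:10],
                          autopct='%1.1f%%', startangle=90)
            axes[idx].set_title(f'Top 10 Institutional Holders for {symbol}')
    # Hide any leftover axes in the last row
    for ax in axes[len(df.index):]:
        ax.set_visible(False)
    plt.tight_layout()
    return fig, axes

# Dummy data: three symbols, two holders each
holders = pd.DataFrame({'Holder': ['Vanguard', 'BlackRock'], 'Shares': [100, 80]})
demo = pd.DataFrame(
    {'Top 10 Institutional Holders': [holders, holders, holders]},
    index=pd.Index(['GOOGL', 'MSFT', 'WMT'], name='Stock Symbol'),
)
fig, axes = create_pie_charts(demo)
```

With three symbols this allocates a 2×2 grid, draws three pies, and blanks the fourth axis instead of raising an IndexError.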
- It is quite interesting to note how these big investment firms hold very similar stakes in both tech giants
Next, we run a side-by-side comparison of stocks that are direct alternates of each other:¶
| S&P Index Comp | Alternate 1 | Alternate 2 | Stock Symbols |
|---|---|---|---|
| Google (Alphabet) | Microsoft |  | GOOG/GOOGL, MSFT |
| Walmart | Amazon | Alibaba | WMT, AMZN, BABA |
| Oracle | SAP | Salesforce | ORCL, SAP, CRM |
In [ ]:
symbols = ['GOOGL', 'MSFT', 'WMT', 'AMZN', 'BABA', 'ORCL', 'SAP', 'CRM']
- We were not able to utilize the scraping approach below directly, so the P/E data was saved to local text files and loaded in the next cell instead
import pandas as pd
STOCK_URLS = {
    'AAPL': 'https://www.macrotrends.net/stocks/charts/AAPL/apple/pe-ratio',
    'AMD': 'https://www.macrotrends.net/stocks/charts/AMD/amd/pe-ratio',
    'NVDA': 'https://www.macrotrends.net/stocks/charts/NVDA/nvidia/pe-ratio',
    'MSFT': 'https://www.macrotrends.net/stocks/charts/MSFT/microsoft/pe-ratio'
}
# Initialize empty DataFrames
dfs = {stock: pd.DataFrame() for stock in STOCK_URLS}
# Loop through each stock and URL
for stock, url in STOCK_URLS.items():
    data = pd.read_html(url, skiprows=1)
    df = pd.DataFrame(data[0])
    # Push the header row back into the data (DataFrame.append was
    # removed in pandas 2.0, so use pd.concat instead)
    df = pd.concat([df.columns.to_frame().T, df], ignore_index=True)
    df.columns = range(len(df.columns))
    df = df[1:]
    df = df.rename(columns={0: 'Date', 1: 'Price', 2: 'EPS', 3: 'PE ratio'})
    df.loc[df.index[1:], 'EPS'] = ''
    df.set_index('Date', inplace=True)
    df = df.sort_index()
    df['trend'] = ''
    df['PE ratio'] = df['PE ratio'].astype(float)
    dfs[stock] = df  # store the cleaned frame for this stock
In [4]:
file_paths = {
'AMD': r"C:\Users\PMLS\Desktop\MicroTrends _Stock_info\AMD PE Ratio 2010-2024.txt",
'Apple': r"C:\Users\PMLS\Desktop\MicroTrends _Stock_info\Apple PE Ratio 2010-2024.txt",
'Microsoft': r"C:\Users\PMLS\Desktop\MicroTrends _Stock_info\Microsoft PE Ratio 2010-2024.txt",
'NVIDIA': r"C:\Users\PMLS\Desktop\MicroTrends _Stock_info\NVIDIA PE Ratio 2010-2024.txt"
}
def clean_eps(value):
if pd.isna(value) or value == '':
return None
return float(value.replace('$', ''))
dfs = {}
for stock, file_path in file_paths.items():
df = pd.read_csv(file_path, sep=r'\s+', header=None, names=['Date', 'Price', 'EPS', 'PE ratio'])
df['EPS'] = df['EPS'].apply(clean_eps)
df['Price'] = pd.to_numeric(df['Price'], errors='coerce')
df['PE ratio'] = pd.to_numeric(df['PE ratio'], errors='coerce')
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
df = df.iloc[1:]
df.sort_index(inplace=True, ascending=False)
dfs[stock] = df
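A self-contained version of the parsing step above, feeding an inline sample through the same cleaning pipeline instead of the local text files (the sample rows are made up for illustration):

```python
import io
import pandas as pd

def clean_eps(value):
    # Strip the dollar sign Macrotrends puts on EPS values
    if pd.isna(value) or value == '':
        return None
    return float(str(value).replace('$', ''))

# Illustrative whitespace-separated rows: Date, Price, EPS, PE ratio
sample = """\
2024-06-30 210.62 $6.57 32.06
2024-03-31 171.48 $6.43 26.67
2023-12-31 192.53 $6.12 31.46
"""
df = pd.read_csv(io.StringIO(sample), sep=r'\s+', header=None,
                 names=['Date', 'Price', 'EPS', 'PE ratio'])
df['EPS'] = df['EPS'].apply(clean_eps)
df['Price'] = pd.to_numeric(df['Price'], errors='coerce')
df['PE ratio'] = pd.to_numeric(df['PE ratio'], errors='coerce')
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
df.sort_index(inplace=True, ascending=False)
```

The result is a numeric, date-indexed frame sorted newest-first, matching what the file-based loop builds per stock.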
In [5]:
# Extract the EPS columns from the DataFrames
eps_values = [df['EPS'] for df in dfs.values()]
fig = go.Figure()
# Add lines for each stock
for stock, eps in zip(dfs.keys(), eps_values):
fig.add_trace(go.Scatter(x=eps.index, y=eps.values, name=stock, line=dict(width=2)))
# Set title and labels
fig.update_layout(
title='EPS Over Time',
xaxis_title='Date',
yaxis_title='EPS',
plot_bgcolor='#333333',
paper_bgcolor='#333333',
font_color='#ffffff'
)
# Set vibrant line colors (one per trace; update_traces would apply the
# whole list to every trace, so assign per trace instead)
for trace, color in zip(fig.data, ['#ff69b4', '#33cc33', '#6666ff', '#ffcc00']):
    trace.line.color = color
# Show the legend
fig.update_layout(showlegend=True)
fig.update_layout(
width=600, # Set the figure width (in pixels)
height=300, # Set the figure height (in pixels)
)
# Show the plot
fig.show()
- Microsoft and Apple continue their steady EPS trend
In [6]:
pe_ratios = [df['PE ratio'] for df in dfs.values()]
stock_symbols = list(dfs.keys())
# Create the figure
fig = go.Figure()
# Add lines for all stocks, using the stock symbols from the dictionary keys
for i, stock in enumerate(stock_symbols):
fig.add_trace(go.Scatter(x=pe_ratios[i].index, y=pe_ratios[i].values, name=stock, line=dict(width=2), visible=(i < 2)))
# Set title and labels
fig.update_layout(
title='PE Ratio Over Time',
xaxis_title='Date',
yaxis_title='PE Ratio',
plot_bgcolor='#333333',
paper_bgcolor='#333333',
font_color='#ffffff'
)
# Set vibrant line colors (one per trace; update_traces would apply the
# whole list to every trace, so assign per trace instead)
for trace, color in zip(fig.data, ['#ff69b4', '#33cc33', '#6666ff', '#ffcc00']):
    trace.line.color = color
# Add dropdown menu
buttons = [
dict(label=f'{stock_symbols[0]} & {stock_symbols[3]}', method='update', args=[{'visible': [True, False, False, True]}]),
dict(label=f'{stock_symbols[1]} & {stock_symbols[2]}', method='update', args=[{'visible': [False, True, True, False]}]),
dict(label='All 4', method='update', args=[{'visible': [True, True, True, True]}])
]
fig.update_layout(updatemenus=[dict(buttons=buttons, direction='down', showactive=True)])
# Show the legend
fig.update_layout(showlegend=True)
# Show the plot
fig.show()
In tech, a normal/good/ideal P/E ratio is quite difficult to define, but on average it is between 15 and 16, and for mature companies it can be larger, in the 20-25 range (source here)
- In September 2023, AMD had a huge P/E ratio of 856.83 and has shown considerable volatility
- NVIDIA maintained its P/E ratio with relatively little variation, though it is still larger than the average P/E
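As a sanity check on readings like AMD's 856.83, P/E is just price divided by trailing EPS; a tiny sketch flagging extreme values (the threshold of 25 follows the mature-company range above, and the price/EPS numbers are illustrative, chosen only to reproduce a ~856.8 reading):

```python
def pe_ratio(price, trailing_eps):
    """Trailing P/E; undefined when EPS is zero or negative."""
    if trailing_eps <= 0:
        return None
    return price / trailing_eps

def flag_extreme(pe, upper=25.0):
    """Label a P/E relative to the upper end of the mature-company band."""
    if pe is None:
        return 'undefined'
    return 'above average' if pe > upper else 'within range'

# A tiny trailing EPS makes the ratio explode, which is how readings
# in the hundreds arise for otherwise ordinary prices
pe = pe_ratio(102.82, 0.12)
label = flag_extreme(pe)
```

This is why a single quarter of near-zero earnings can produce a triple-digit P/E without any change in the share price.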
In [ ]: